Introduction to R and RStudio
This lesson introduces participants to the R programming environment and the RStudio interface. Participants will learn how to navigate RStudio, create and manage R projects, work with scripts, use the console and understand core R concepts such as variables, functions and the assignment operator. The lesson also guides participants through organizing project directories, interacting with data and adopting recommended practices for reproducible workflows in R.
Approximate time: 30 minutes
Learning Objectives
- Describe what R and RStudio are.
- Interact with R using RStudio.
- Become familiar with the various components of RStudio.
- Employ variables in R.
What is R?
A common misconception is that R is only a programming language, but it is much more than that. Think of R as an environment for statistical computing and graphics, which brings together a number of features to provide powerful functionality.
The R environment combines:
- Effective handling of big data
- Collection of integrated tools
- Graphical facilities
- Simple and effective programming language
Why use R?
R is a powerful, extensible environment. It has a wide range of statistics and general data analysis and visualization capabilities.
- Data handling, wrangling, and storage
- Wide array of statistical methods and graphical techniques available
- Easy to install on any platform and use (and it’s free!)
- Open source with a large and growing community of peers
Examples of R used in the media and science
- “At the BBC data team, we have developed an R package and an R cookbook to make the process of creating publication-ready graphics in our in-house style…” - BBC Visual and Data Journalism cookbook for R graphics
- “R package of data and code behind the stories and interactives at FiveThirtyEight.com, a data-driven journalism website founded by Nate Silver (initially began as a polling aggregation site, but now covers politics, sports, science and pop culture) and owned by ESPN…” - fivethirtyeight Package
- Single Cell RNA-seq Data analysis with Seurat
What is RStudio?
RStudio is a freely available, open-source Integrated Development Environment (IDE). RStudio provides an environment with many features to make using R easier and is a great alternative to working on R in the terminal.
- Graphical user interface, not just a command prompt
- Great learning tool
- Free for academic use
- Platform agnostic
- Open source
Let’s create a new project directory for our “Introduction to R” lesson today.
- Open RStudio
- Go to the File menu and select New Project.
- In the New Project window, choose New Directory. Then, choose New Project. Name your new directory Intro-to-R and then “Create the project as subdirectory of:” the Desktop (or location of your choice).
- Click on Create Project.
- After your project is completed, if the project does not automatically open in RStudio, then go to the File menu, select Open Project, and choose Intro-to-R.Rproj.
- When RStudio opens, you will see three panels in the window.
- Go to the File menu and select New File, and select R Script.
- Go to the File menu and select Save As..., type Intro-to-R.R and select Save.
The RStudio interface should now look like the screenshot below.
What is a project in RStudio?
It is simply a directory that contains everything related to your analyses for a specific project. RStudio projects are useful when you are working on context-specific analyses and you wish to keep them separate. When creating a project in RStudio you associate it with a working directory of your choice (either an existing one, or a new one). An .Rproj file is created within that directory; together with the .RData and .Rhistory files that RStudio saves there, it keeps track of your command history and the variables in the environment. The .Rproj file can be used to reopen the project in its current state at a later date.
When a project is (re)opened within RStudio the following actions are taken:
- A new R session (process) is started
- The .RData file in the project’s main directory is loaded, populating the environment with any objects that were present when the project was closed.
- The .Rhistory file in the project’s main directory is loaded into the RStudio History pane (and used for Console Up/Down arrow command history).
- The current working directory is set to the project directory.
- Previously edited source documents are restored into editor tabs
- Other RStudio settings (e.g. active tabs, splitter positions, etc.) are restored to where they were the last time the project was closed.
Information adapted from RStudio Support Site
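The files mentioned in the list above can be checked from the console. This is a minimal sketch, assuming the working directory is the project directory. Both files exist only if a workspace or history was saved in an earlier session, so FALSE is a normal result for a brand-new project.

```r
# Check whether a saved workspace and a saved command history exist in
# the current working directory (assumed here to be the project directory).
has_workspace <- file.exists(".RData")
has_history <- file.exists(".Rhistory")
has_workspace
has_history

# List the objects currently present in the environment.
ls()
```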
Adding directories
In order to add a directory, or folder, within your RStudio project, you can click the “Create a new folder” button at the top-left of the Files/Plots/Packages/Help window. It looks like a folder with a plus sign on top of it. Let’s name the directory scripts and click “OK”.
Next, we are going to add the data directory:
- This directory can be downloaded from here. Right-click on this link and select “Save Link As…”.
- Navigate to your RStudio project and save the file.
- Go to the file in your file browser and double-click on the data.zip file to uncompress it.
Windows users will need to check that within the data folder, there isn’t a second data directory within the data directory. If there is, bring the nested data directory out of the data directory and place it directly within your RStudio project.
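As an alternative to double-clicking, the archive can probably be unpacked from the R console as well. The sketch below assumes the download was saved as data.zip in the project directory and that the working directory is the project directory. It also checks for the nested data/data folder described above.

```r
# Unpack data.zip into the current working directory, but only if the
# file has actually been downloaded there (assumed file name: data.zip).
if (file.exists("data.zip")) {
  unzip("data.zip")
}

# TRUE here means there is a second "data" folder nested inside "data",
# which should be moved up into the project directory.
nested_data <- dir.exists(file.path("data", "data"))
nested_data
```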
The RStudio interface should now look like this:
Next, we are going to create an R script to keep a record of our work, but first we will implement good data management practices and create a folder to hold this script. In the bottom-right panel there should be a button that has a folder with a green plus on it and says “New Folder”. Click this button:
Depending on the size of your window, the “New Folder” button may just have the icon or the icon and the word “Folder”.
A window should pop-up prompting you to provide a name for the folder. Type “scripts” into the text box and click “OK”:
Now in the Files tab of the bottom-right panel, you will see a “scripts” folder.
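The same folder can also be created from the console instead of with the “New Folder” button. This is a small sketch, assuming the working directory is the project directory.

```r
# Create the "scripts" folder in the current working directory.
# showWarnings = FALSE keeps R quiet if the folder already exists.
dir.create("scripts", showWarnings = FALSE)

# Confirm that the folder is now present.
dir.exists("scripts")
```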
Next, we will create the R script that we will write our code in. In order to create an R script, click on the “File” menu option in the top-left, then “New File” and then select “R Script”.
This will create an R Script as the top panel on the left side of your RStudio window.
Before we go any further, let’s save our new R Script by following the steps listed below:
- Click on the “File” menu option in the top-left and find “Save As…”.
- You will see a file browser window pop up showing your working directory. Click the “scripts” folder.
- Let’s name our R Script “Intro-to-R.R” in the text field and click on the “Save” button.
RStudio Interface
The RStudio interface has four main panels:
- Console: This is where you can type commands and see output. The console is all you would see if you ran R in the command line without RStudio.
- Script editor: This is where you can type out commands and save to file. You can also submit the commands to run in the console.
- Environment/History: “Environment” shows all active objects and “History” keeps track of all commands run in the console.
- Files/Plots/Packages/Help: “Files” shows a file browser and “Plots” will populate when you create a plot. “Packages” will help you manage packages from CRAN and Bioconductor. “Help” holds manuals to the functions within R.
Organizing your working directory & setting up
Viewing your working directory
Before we organize our working directory, let’s check to see where our current working directory is located by typing getwd() into the console.
Your working directory should be the folder where the R project is located on your computer. The working directory is where RStudio will automatically look for any files you bring in and where it will automatically save any files you create, unless otherwise specified.
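For reference, the working directory can be inspected from the console as sketched here. The path that is printed depends on where the project was created on your computer, so no expected output is shown.

```r
# Store and print the absolute path of the current working directory.
wd <- getwd()
wd

# List the files and folders that R sees in that directory.
list.files()
```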
You can visualize your working directory by selecting the Files tab from the Files/Plots/Packages/Help window.
If you wanted to choose a different directory to be your working directory, you could navigate to a different folder in the Files tab, then click on the More dropdown menu (shown as a cog icon) and select Set As Working Directory.
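The menu action has a console equivalent in the setwd() function. In this sketch, tempdir() is only a stand-in for a folder of your choice. Inside an RStudio project you will rarely need this, because the working directory is already set to the project directory.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Switch to another directory. tempdir() is used here only as a
# placeholder for a folder of your choice.
setwd(tempdir())
getwd()

# Switch back to the original working directory.
setwd(old_wd)
```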
To organize your working directory for a particular analysis, you should separate the original data (raw data) from intermediate datasets. For instance, you may want to create a data/ directory within your working directory that stores the raw data, a scripts/ directory for your R scripts, a results/ directory for intermediate datasets and a figures/ directory for the plots you will generate.
We have provided you with the data/ directory for your R project and we made our scripts/ directory together. For this exercise create the results/ and figures/ directories.
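One possible way to do this from the console is sketched below, assuming the working directory is the project directory. The “New Folder” button works just as well.

```r
# Create the two remaining folders in the current working directory.
# showWarnings = FALSE keeps R quiet if a folder already exists.
for (d in c("results", "figures")) {
  dir.create(d, showWarnings = FALSE)
}

# List the top-level folders of the working directory.
list.dirs(recursive = FALSE)
```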
When finished, your working directory should look like this:
Setting up
This is more of a housekeeping task. We will be writing long lines of code in our script editor and want to make sure that the lines “wrap”, so that you don’t have to scroll back and forth to read a long line of code.
Click on “Code” at the top of your RStudio screen and left-click “Soft Wrap Long Lines” in the pull down menu.